Multivariate Normal distribution (multivariate_normal)#
The multivariate normal (a.k.a. multivariate Gaussian) is the canonical distribution for continuous, correlated random vectors. It generalizes the univariate normal by replacing the scalar variance with a covariance matrix.
It is foundational in statistics and machine learning: linear regression, Kalman filtering, Gaussian processes, LDA/QDA, PCA, and as a building block for Gaussian mixture models.
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
import scipy
from scipy import stats
from scipy.stats import multivariate_normal
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=5, suppress=True)
rng = np.random.default_rng(42)
print("NumPy ", np.__version__)
print("SciPy ", scipy.__version__)
NumPy 1.26.2
SciPy 1.15.0
Notebook roadmap#
Definition and parameter space
Intuition and relationships
PDF / CDF
Moments and key properties
How parameters change the shape
Likelihood + MLE sketch
NumPy-only sampling
Visualizations (PDF, CDF, samples)
SciPy integration (scipy.stats.multivariate_normal)
Statistical use cases
Pitfalls
Summary
1) Title & Classification#
| Item | Value |
|---|---|
| Name | Multivariate Normal (`multivariate_normal`) |
| Type | Continuous |
| Dimension | \(d \in \mathbb{N}\) (fixed) |
| Support | \(x \in \mathbb{R}^d\) |
| Parameters | mean \(\mu \in \mathbb{R}^d\), covariance \(\Sigma \in \mathbb{S}_{++}^d\) |
Here \(\mathbb{S}_{++}^d\) denotes the set of symmetric positive definite \(d\times d\) matrices.
SciPy note: scipy.stats.multivariate_normal also supports allow_singular=True, which relaxes \(\Sigma\) to be positive semidefinite.
2) Intuition & Motivation#
What it models#
A multivariate normal models a random vector whose uncertainty looks like an ellipsoid in \(\mathbb{R}^d\). The key geometric object is the Mahalanobis distance

\[
\delta^2(x) = (x-\mu)^\top \Sigma^{-1} (x-\mu),
\]

which replaces the “\(z\)-score squared” from the univariate normal.
A very useful generative story:
Draw \(Z \sim \mathcal{N}(0, I_d)\) (independent standard normals).
Choose a matrix \(L\) such that \(\Sigma = L L^\top\) (e.g. Cholesky factor).
Set \(X = \mu + L Z\).
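This story can be checked numerically; a minimal NumPy sketch (the seed and parameters here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T
Z = rng.standard_normal((100_000, 2))  # rows are independent N(0, I_2) draws
X = mu + Z @ L.T                       # X = mu + L Z, applied row-wise

print(X.mean(axis=0))                  # approximately mu
print(np.cov(X, rowvar=False))         # approximately Sigma
```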
Typical real-world use cases#
Measurement noise with correlated sensors / features
Kalman filters and state-space models (Gaussian noise + linear dynamics)
Gaussian processes: any finite set of function values is multivariate normal
LDA/QDA: class-conditional feature distributions are modeled as Gaussian
PCA / factor models: covariance structure is the main object
Relations to other distributions#
Univariate normal is the special case \(d=1\).
Linear transformations preserve normality: if \(X\sim\mathcal{N}(\mu,\Sigma)\), then \(AX+b\) is normal.
Conditionals and marginals of a multivariate normal are again (multi)variate normals.
The quadratic form \(\delta^2(X)\) has a chi-square distribution: \(\delta^2(X) \sim \chi^2_d\).
The sample covariance of Gaussian data is Wishart-distributed.
(A deeper fact: among all continuous distributions with a given mean and covariance, the multivariate normal has maximum entropy.)
3) Formal Definition#
Let \(d\) be the dimension, \(\mu\in\mathbb{R}^d\), and \(\Sigma\in\mathbb{S}_{++}^d\). We write \(X \sim \mathcal{N}(\mu, \Sigma)\).
PDF#
The probability density function (PDF) is

\[
f(x) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2}\,\exp\!\Big(-\tfrac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\Big).
\]

A numerically stable way to work is the log-density:

\[
\log f(x) = -\tfrac{1}{2}\Big(d\log(2\pi) + \log|\Sigma| + (x-\mu)^\top\Sigma^{-1}(x-\mu)\Big).
\]
CDF#
The multivariate CDF is defined by integrating the density over a lower-left orthant:

\[
F(x) = \mathbb{P}(X_1 \le x_1, \ldots, X_d \le x_d) = \int_{-\infty}^{x_1}\!\cdots\int_{-\infty}^{x_d} f(t)\,dt_d\cdots dt_1.
\]

For \(d>1\), there is no general closed form; practical evaluation uses numerical methods.
def _as_2d(x: np.ndarray) -> np.ndarray:
x = np.asarray(x, dtype=float)
if x.ndim == 1:
return x.reshape(1, -1)
if x.ndim != 2:
raise ValueError("x must be a 1D or 2D array")
return x
def _check_mean_cov(mean: np.ndarray, cov: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
mean = np.asarray(mean, dtype=float)
cov = np.asarray(cov, dtype=float)
if mean.ndim != 1:
raise ValueError("mean must be 1D")
if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
raise ValueError("cov must be a square 2D array")
d = mean.shape[0]
if cov.shape != (d, d):
raise ValueError(f"cov must have shape ({d}, {d})")
if not np.allclose(cov, cov.T, atol=1e-12, rtol=0):
raise ValueError("cov must be symmetric")
return mean, cov
def mvn_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
"""Log-PDF of a multivariate normal using NumPy only (SPD covariance required).
Args:
x: shape (d,) or (n, d)
mean: shape (d,)
cov: shape (d, d), symmetric positive definite
Returns:
logpdf values with shape (n,) (or scalar if x is (d,)).
"""
mean, cov = _check_mean_cov(mean, cov)
x2 = _as_2d(x)
d = mean.shape[0]
if x2.shape[1] != d:
raise ValueError(f"x must have dimension d={d}")
# Cholesky: cov = L L^T (requires SPD)
L = np.linalg.cholesky(cov)
diff = (x2 - mean) # (n, d)
# Solve L y = diff^T => y = L^{-1} diff^T
y = np.linalg.solve(L, diff.T) # (d, n)
maha2 = np.sum(y**2, axis=0) # (n,)
log_det = 2.0 * np.sum(np.log(np.diag(L)))
log_norm = -0.5 * (d * np.log(2.0 * np.pi) + log_det)
out = log_norm - 0.5 * maha2
if np.asarray(x).ndim == 1:
return float(out[0])
return out
def mvn_pdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
return np.exp(mvn_logpdf(x, mean, cov))
def mvn_rvs(
mean: np.ndarray,
cov: np.ndarray,
size: int,
rng: np.random.Generator | None = None,
) -> np.ndarray:
"""Sample from N(mean, cov) using NumPy only (via Cholesky).
Returns an array of shape (size, d).
"""
if rng is None:
rng = np.random.default_rng()
mean, cov = _check_mean_cov(mean, cov)
d = mean.shape[0]
L = np.linalg.cholesky(cov)
z = rng.standard_normal(size=(int(size), d))
return mean + z @ L.T
def mvn_cdf_mc(
x: np.ndarray,
mean: np.ndarray,
cov: np.ndarray,
n: int = 50_000,
rng: np.random.Generator | None = None,
) -> tuple[float, float]:
"""Monte Carlo estimate of F(x) = P(X_1<=x_1,...,X_d<=x_d).
Returns (estimate, standard_error).
"""
if rng is None:
rng = np.random.default_rng()
x = np.asarray(x, dtype=float)
if x.ndim != 1:
raise ValueError("x must be a 1D point")
samples = mvn_rvs(mean, cov, size=int(n), rng=rng)
hits = np.all(samples <= x[None, :], axis=1)
p = float(np.mean(hits))
se = float(np.sqrt(p * (1.0 - p) / n))
return p, se
# Quick spot-check vs SciPy
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.5, 0.4], [0.4, 0.8]])
x_test = np.array([[0.0, 0.0], [1.0, -1.5], [2.0, -0.5]])
ours = mvn_logpdf(x_test, mu, Sigma)
theirs = multivariate_normal(mean=mu, cov=Sigma).logpdf(x_test)
print("max |logpdf_ours - logpdf_scipy| =", float(np.max(np.abs(ours - theirs))))
max |logpdf_ours - logpdf_scipy| = 0.0
4) Moments & Properties#
Let \(X \sim \mathcal{N}(\mu,\Sigma)\) in dimension \(d\).
Moments#
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\mu\) |
| Covariance | \(\mathrm{Cov}(X)=\Sigma\) |
| Variance of component \(i\) | \(\mathrm{Var}(X_i)=\Sigma_{ii}\) |
| Variance of linear combo \(a^\top X\) | \(\mathrm{Var}(a^\top X)=a^\top\Sigma a\) |
| Skewness | 0 (all odd central moments are 0) |
| Kurtosis | any 1D marginal has kurtosis 3; Mardia kurtosis \(\beta_{2,d}=d(d+2)\) |
A common multivariate kurtosis measure (Mardia) is

\[
\beta_{2,d} = \mathbb{E}\Big[\big((X-\mu)^\top\Sigma^{-1}(X-\mu)\big)^2\Big] = d(d+2).
\]
MGF and characteristic function#
For \(t\in\mathbb{R}^d\), the moment-generating function is

\[
M_X(t) = \mathbb{E}\big[e^{t^\top X}\big] = \exp\!\Big(t^\top\mu + \tfrac{1}{2}\,t^\top\Sigma t\Big),
\]

and the characteristic function is

\[
\varphi_X(t) = \mathbb{E}\big[e^{i t^\top X}\big] = \exp\!\Big(i\,t^\top\mu - \tfrac{1}{2}\,t^\top\Sigma t\Big).
\]
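As a sanity check, the MGF formula can be compared against a Monte Carlo average of \(e^{t^\top X}\); a small sketch with arbitrary \(\mu\), \(\Sigma\), and \(t\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.2, -0.3])
Sigma = np.array([[1.0, 0.4], [0.4, 0.8]])
t = np.array([0.3, -0.5])

# Draw many samples via the Cholesky construction
L = np.linalg.cholesky(Sigma)
X = mu + rng.standard_normal((500_000, 2)) @ L.T

mgf_mc = np.mean(np.exp(X @ t))                    # E[exp(t^T X)] by Monte Carlo
mgf_theory = np.exp(t @ mu + 0.5 * t @ Sigma @ t)  # closed form
print(mgf_mc, mgf_theory)                          # should agree to ~2-3 decimals
```

Averages of exponentials have relatively high variance, so agreement is only approximate at this sample size.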
Entropy#
The differential entropy (in nats) is

\[
H(X) = \tfrac{1}{2}\log\big((2\pi e)^d\,|\Sigma|\big).
\]
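This closed form can be checked directly against SciPy's entropy method (the example covariance is arbitrary; note the entropy does not depend on the mean):

```python
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
d = Sigma.shape[0]

# H(X) = 1/2 * log((2*pi*e)^d * det(Sigma))
H_formula = 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(Sigma))
H_scipy = float(multivariate_normal(mean=np.zeros(d), cov=Sigma).entropy())
print(H_formula, H_scipy)  # identical up to floating-point error
```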
Key closure properties#
Affine transform: if \(X\sim\mathcal{N}(\mu,\Sigma)\) then \(Y=AX+b\) is normal with \(\mathbb{E}[Y]=A\mu+b\) and \(\mathrm{Cov}(Y)=A\Sigma A^\top\).
Marginals: any subvector of \(X\) is multivariate normal.
Conditionals: \(X_A\mid X_B=b\) is multivariate normal (with closed-form mean/cov).
Uncorrelated = independent (special to the multivariate normal): if \(\mathrm{Cov}(X_i,X_j)=0\) then \(X_i\) and \(X_j\) are independent.
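The affine-transform property is easy to confirm empirically; a sketch with an arbitrary \(A\) and \(b\):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[1.5, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 0.8]])
A = np.array([[1.0, 2.0, -1.0], [0.0, 1.0, 1.0]])  # maps R^3 -> R^2
b = np.array([0.5, -0.5])

L = np.linalg.cholesky(Sigma)
X = mu + rng.standard_normal((200_000, 3)) @ L.T
Y = X @ A.T + b                                    # Y = A X + b, row-wise

print(Y.mean(axis=0), A @ mu + b)                  # means agree
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)    # covariances agree
```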
5) Parameter Interpretation#
The mean vector \(\mu\) is a location parameter: it shifts the center of mass.
The covariance matrix \(\Sigma\) controls:
scale (overall spread),
anisotropy (different variance along different directions),
correlation (tilt/rotation of the ellipsoids).
A helpful view is the eigendecomposition

\[
\Sigma = Q \Lambda Q^\top,
\]

where columns of \(Q\) are orthonormal eigenvectors and \(\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_d)\). Then:
eigenvectors give the principal axes of the density,
eigenvalues give the variances along those axes.
Increasing an eigenvalue stretches the ellipsoid in that direction; changing correlation rotates the ellipsoid.
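This picture can be read off numerically with np.linalg.eigh; a sketch for a 2x2 covariance with unit variances and correlation \(\rho\):

```python
import numpy as np

rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)
print("variances along principal axes:", eigvals)  # [1 - rho, 1 + rho]
print("principal axes (columns):\n", eigvecs)
# With rho > 0 the long axis lies along (1, 1)/sqrt(2): the ellipse tilts at 45 degrees.
```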
# How correlation changes the shape (d=2, unit variances)
mu = np.zeros(2)
rhos = [-0.8, 0.0, 0.8]
x1 = np.linspace(-3.5, 3.5, 160)
x2 = np.linspace(-3.5, 3.5, 160)
X1, X2 = np.meshgrid(x1, x2)
grid = np.column_stack([X1.ravel(), X2.ravel()])
fig = make_subplots(
rows=1,
cols=3,
subplot_titles=[f"correlation ρ = {rho}" for rho in rhos],
horizontal_spacing=0.06,
)
for j, rho in enumerate(rhos, start=1):
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Z = mvn_pdf(grid, mu, Sigma).reshape(X1.shape)
fig.add_trace(
go.Contour(
x=x1,
y=x2,
z=Z,
contours=dict(showlabels=False),
colorscale="Viridis",
line_smoothing=0.8,
showscale=(j == 3),
colorbar=dict(title="pdf") if j == 3 else None,
),
row=1,
col=j,
)
fig.update_xaxes(title_text="x1", row=1, col=j)
fig.update_yaxes(title_text="x2", row=1, col=j)
fig.update_layout(title="Effect of correlation on the PDF (unit variances)", height=360)
fig.show()
6) Derivations#
Below are standard derivations/sketches that are useful to remember.
Expectation and covariance#
Using the generative representation \(X=\mu+LZ\) with \(Z\sim\mathcal{N}(0,I)\) and \(\Sigma=LL^\top\):

\[
\mathbb{E}[X] = \mu + L\,\mathbb{E}[Z] = \mu.
\]

For the covariance:

\[
\mathrm{Cov}(X) = \mathbb{E}\big[(LZ)(LZ)^\top\big] = L\,\mathbb{E}[ZZ^\top]\,L^\top = L I L^\top = \Sigma.
\]
Likelihood (i.i.d. data)#
Let \(x_1,\ldots,x_n\) be i.i.d. from \(\mathcal{N}(\mu,\Sigma)\). The log-likelihood is

\[
\ell(\mu,\Sigma) = -\frac{nd}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^n (x_i-\mu)^\top\Sigma^{-1}(x_i-\mu).
\]

The maximum-likelihood estimators (MLEs) are

\[
\hat\mu = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat\Sigma = \frac{1}{n}\sum_{i=1}^n (x_i-\hat\mu)(x_i-\hat\mu)^\top.
\]

(The unbiased sample covariance replaces \(1/n\) by \(1/(n-1)\).)
# Verify the mean/covariance derivation empirically
d = 3
mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array(
[
[2.0, 0.3, -0.2],
[0.3, 1.0, 0.4],
[-0.2, 0.4, 1.5],
]
)
X = mvn_rvs(mu_true, Sigma_true, size=120_000, rng=rng)
mu_hat = X.mean(axis=0)
cov_hat_mle = np.cov(X, rowvar=False, bias=True) # 1/n
print("mu_true:", mu_true)
print("mu_hat :", mu_hat)
print("\nSigma_true:\n", Sigma_true)
print("\nSigma_hat (MLE, 1/n):\n", cov_hat_mle)
# Compare MLE to SciPy's fit
mu_fit, cov_fit = multivariate_normal.fit(X)
print("\nmax |mu_fit - mu_hat| =", float(np.max(np.abs(mu_fit - mu_hat))))
print("max |cov_fit - cov_hat_mle| =", float(np.max(np.abs(cov_fit - cov_hat_mle))))
# Log-likelihood at the true params vs at the fitted params
ll_true = float(np.sum(multivariate_normal(mean=mu_true, cov=Sigma_true).logpdf(X)))
ll_fit = float(np.sum(multivariate_normal(mean=mu_fit, cov=cov_fit).logpdf(X)))
print("\nlog-likelihood (true params):", ll_true)
print("log-likelihood (fitted MLE) :", ll_fit)
mu_true: [ 1. -2. 0.5]
mu_hat : [ 0.99764 -2.00281 0.50352]
Sigma_true:
[[ 2. 0.3 -0.2]
[ 0.3 1. 0.4]
[-0.2 0.4 1.5]]
Sigma_hat (MLE, 1/n):
[[ 2.01425 0.3019 -0.19628]
[ 0.3019 0.99606 0.39907]
[-0.19628 0.39907 1.49794]]
max |mu_fit - mu_hat| = 0.0
max |cov_fit - cov_hat_mle| = 4.440892098500626e-16
log-likelihood (true params): -564889.0870789137
log-likelihood (fitted MLE) : -564884.8911281398
7) Sampling & Simulation#
NumPy-only algorithm (Cholesky)#
To sample \(X\sim\mathcal{N}(\mu,\Sigma)\):
Compute the Cholesky factor \(L\) of the covariance: \(\Sigma = LL^\top\).
Draw \(Z\in\mathbb{R}^d\) with i.i.d. standard normal entries.
Return \(X = \mu + LZ\).
This works because affine transformations of Gaussians are Gaussian.
Cost:
one Cholesky factorization costs \(\mathcal{O}(d^3)\),
each sample costs \(\mathcal{O}(d^2)\) for the matrix-vector multiply.
If \(\Sigma\) is nearly singular, Cholesky can fail; common fixes include adding a small diagonal “jitter” \(\varepsilon I\).
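One way to package the jitter fix, as a sketch (the starting jitter of 1e-9 times the mean diagonal, and the factor-of-10 escalation, are arbitrary choices):

```python
import numpy as np

def safe_cholesky(cov: np.ndarray, max_tries: int = 8) -> np.ndarray:
    """Cholesky with escalating diagonal jitter if plain factorization fails."""
    jitter = 0.0
    base = 1e-9 * float(np.mean(np.diag(cov)))
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
        except np.linalg.LinAlgError:
            jitter = base if jitter == 0.0 else jitter * 10.0
    raise np.linalg.LinAlgError("covariance not positive definite even with jitter")

# A rank-deficient (positive semidefinite) covariance: plain Cholesky fails
v = np.array([[1.0], [2.0]])
Sigma_singular = v @ v.T                         # rank 1
L = safe_cholesky(Sigma_singular)                # succeeds after adding jitter
print(np.max(np.abs(L @ L.T - Sigma_singular)))  # error on the order of the jitter
```

The jitter slightly inflates every marginal variance, so it should be kept as small as the conditioning allows.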
# Sampling demo: squared Mahalanobis distance should look chi-square(d)
d = 4
mu0 = np.zeros(d)
A = rng.normal(size=(d, d))
Sigma0 = A @ A.T + 0.5 * np.eye(d) # SPD
X = mvn_rvs(mu0, Sigma0, size=50_000, rng=rng)
# Compute delta^2(x) = (x-mu)^T Sigma^{-1} (x-mu) without forming Sigma^{-1}
L = np.linalg.cholesky(Sigma0)
Y = np.linalg.solve(L, (X - mu0).T) # (d, n)
delta2 = np.sum(Y**2, axis=0)
# Compare moments with chi-square(d)
print("E[delta^2] empirical:", float(np.mean(delta2)), "theory:", d)
print("Var[delta^2] empirical:", float(np.var(delta2)), "theory:", 2 * d)
E[delta^2] empirical: 4.007935607274497 theory: 4
Var[delta^2] empirical: 8.065157788886816 theory: 8
8) Visualization#
We’ll focus on \(d=2\) so we can visualize:
PDF as contours
CDF as a heatmap of \(F(x_1,x_2)=\mathbb{P}(X_1\le x_1, X_2\le x_2)\)
Monte Carlo samples as a scatter plot
mu2 = np.array([0.5, -0.5])
Sigma2 = np.array([[1.0, 0.75], [0.75, 1.8]])
# Grid
x1 = np.linspace(-3.5, 4.0, 180)
x2 = np.linspace(-4.5, 3.5, 180)
X1, X2 = np.meshgrid(x1, x2)
grid = np.column_stack([X1.ravel(), X2.ravel()])
# PDF via our NumPy implementation
Z_pdf = mvn_pdf(grid, mu2, Sigma2).reshape(X1.shape)
# Samples
S = mvn_rvs(mu2, Sigma2, size=2_000, rng=rng)
fig = go.Figure()
fig.add_trace(
go.Contour(
x=x1,
y=x2,
z=Z_pdf,
contours=dict(showlabels=False),
colorscale="Viridis",
line_smoothing=0.8,
showscale=True,
name="PDF",
)
)
fig.add_trace(
go.Scatter(
x=S[:, 0],
y=S[:, 1],
mode="markers",
marker=dict(size=4, color="rgba(0,0,0,0.35)"),
name="samples",
)
)
fig.update_layout(
title="Multivariate normal: PDF contours + Monte Carlo samples (d=2)",
xaxis_title="x1",
yaxis_title="x2",
legend=dict(orientation="h"),
)
fig.show()
# CDF heatmap using SciPy's multivariate_normal.cdf (numerical evaluation)
rv2 = multivariate_normal(mean=mu2, cov=Sigma2)
Z_cdf = rv2.cdf(grid).reshape(X1.shape)
fig = go.Figure(
data=go.Heatmap(
x=x1,
y=x2,
z=Z_cdf,
colorscale="Blues",
colorbar=dict(title="F(x1,x2)"),
)
)
fig.update_layout(
title="Multivariate normal: CDF heatmap (d=2)",
xaxis_title="x1",
yaxis_title="x2",
)
fig.show()
# A quick MC check at one point
x_point = np.array([0.0, 0.0])
p_mc, se_mc = mvn_cdf_mc(x_point, mu2, Sigma2, n=100_000, rng=rng)
p_scipy = float(rv2.cdf(x_point))
print("F(0,0) SciPy:", p_scipy)
print("F(0,0) MC :", p_mc, "+/-", 2 * se_mc)
F(0,0) SciPy: 0.270207507991013
F(0,0) MC : 0.27102 +/- 0.0028111788246214433
9) SciPy Integration (scipy.stats.multivariate_normal)#
SciPy provides the multivariate normal as scipy.stats.multivariate_normal.
rv = scipy.stats.multivariate_normal(mean=mu, cov=Sigma)
Common methods:
pdf / logpdf
cdf (numerical)
rvs
entropy
fit (MLE for mean and covariance)
Note: pdf can underflow in moderate dimension; prefer logpdf for likelihood work.
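A quick illustration (the dimension 800 is an arbitrary choice, large enough that the density drops below the smallest positive float64):

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 800
rv = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
x = np.full(d, 0.5)   # a fairly typical point under this distribution

print(rv.pdf(x))      # 0.0 -- underflows in float64
print(rv.logpdf(x))   # about -835, finite and usable
```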
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, -0.3], [-0.3, 1.0]])
rv = multivariate_normal(mean=mu, cov=Sigma)
pts = np.array([[1.0, 2.0], [0.0, 0.0], [2.0, 3.0]])
print("pdf :", rv.pdf(pts))
print("logpdf:", rv.logpdf(pts))
print("cdf :", rv.cdf(pts))
print("entropy:", float(rv.entropy()), "nats")
samples = rv.rvs(size=5_000, random_state=rng)
# Fit parameters (MLE)
mu_fit, cov_fit = multivariate_normal.fit(samples)
print("\nmu_fit:", mu_fit)
print("cov_fit:\n", cov_fit)
# Compare to our NumPy-only logpdf
ours = mvn_logpdf(pts, mu, Sigma)
theirs = rv.logpdf(pts)
print("\nmax |logpdf_ours - logpdf_scipy| =", float(np.max(np.abs(ours - theirs))))
pdf : [0.11516 0.00797 0.04488]
logpdf: [-2.16143 -4.83159 -3.10384]
cdf : [0.21598 0.00246 0.6249 ]
entropy: 3.1614286874386144 nats
mu_fit: [0.98169 2.02102]
cov_fit:
[[ 1.9965 -0.30165]
[-0.30165 0.98532]]
max |logpdf_ours - logpdf_scipy| = 4.440892098500626e-16
10) Statistical Use Cases#
Hypothesis testing#
Mean vector testing: Hotelling’s \(T^2\) tests \(H_0: \mu=\mu_0\) for i.i.d. Gaussian data.
Covariance testing: tests about \(\Sigma\) (equality, sphericity) often use Wishart theory.
Normality checks: multivariate normality can be assessed with Q–Q plots of Mahalanobis distances (should be \(\chi^2_d\)).
Bayesian modeling#
A multivariate normal is a natural prior for a vector of parameters.
With a Gaussian likelihood and known covariance, the posterior for the mean is again Gaussian.
For unknown mean and covariance, the conjugate prior is Normal–Inverse-Wishart.
Generative modeling#
Gaussian mixture models (GMMs) approximate complex densities by mixing Gaussians.
Many latent-variable models assume Gaussian latent variables (e.g., factor analysis, VAEs).
# Hotelling's T^2 test demo: H0: mu = mu0
# Under H0: (n-d)/(d*(n-1)) * T^2 ~ F_{d, n-d}
n = 60
d = 3
mu_true = np.array([0.2, -0.1, 0.3])
Sigma_true = np.array(
[
[1.2, 0.2, 0.0],
[0.2, 1.0, 0.4],
[0.0, 0.4, 1.5],
]
)
X = mvn_rvs(mu_true, Sigma_true, size=n, rng=rng)
mu0 = np.zeros(d) # null hypothesis
xbar = X.mean(axis=0)
S_unbiased = np.cov(X, rowvar=False, bias=False) # 1/(n-1)
# Solve S^{-1}(xbar-mu0) without explicit inverse
diff = (xbar - mu0).reshape(-1, 1)
sol = np.linalg.solve(S_unbiased, diff)
T2 = float(n * (diff.T @ sol).item())  # .item(): (1,1) array -> scalar, avoids NumPy deprecation
F_stat = (n - d) / (d * (n - 1)) * T2
p_value = float(stats.f.sf(F_stat, dfn=d, dfd=n - d))
print("T^2 statistic:", T2)
print("F statistic :", F_stat)
print("p-value :", p_value)
T^2 statistic: 8.301062635065886
F statistic : 2.673223560444946
p-value : 0.055836249807211434
# Multivariate normality check: Mahalanobis-distance Q-Q vs chi-square
n = 600
d = 2
mu_true = np.array([0.0, 0.0])
Sigma_true = np.array([[1.0, 0.6], [0.6, 1.3]])
X = mvn_rvs(mu_true, Sigma_true, size=n, rng=rng)
mu_hat = X.mean(axis=0)
S_unbiased = np.cov(X, rowvar=False, bias=False)
# delta^2_i = (x_i-mu_hat)^T S^{-1} (x_i-mu_hat)
L = np.linalg.cholesky(S_unbiased)
Y = np.linalg.solve(L, (X - mu_hat).T)
delta2 = np.sum(Y**2, axis=0)
delta2_sorted = np.sort(delta2)
q = (np.arange(1, n + 1) - 0.5) / n
chi2_q = stats.chi2.ppf(q, df=d)
maxv = float(max(delta2_sorted.max(), chi2_q.max()))
fig = go.Figure()
fig.add_trace(go.Scatter(x=chi2_q, y=delta2_sorted, mode="markers", name="data"))
fig.add_trace(go.Scatter(x=[0, maxv], y=[0, maxv], mode="lines", name="y = x"))
fig.update_layout(
title="Mahalanobis-distance Q-Q plot (should be ~ linear under MVN)",
xaxis_title="Chi-square quantiles (df=d)",
yaxis_title="Sorted Mahalanobis distances (delta^2)",
)
fig.show()
# Bayesian update for the mean (known covariance):
# Prior: mu ~ N(m0, V0)
# Likelihood: x_i | mu ~ N(mu, Sigma)
# Posterior: mu | X ~ N(mn, Vn)
d = 2
Sigma = np.array([[1.0, 0.4], [0.4, 1.5]])
m0 = np.array([0.0, 0.0])
V0 = 4.0 * np.eye(d) # diffuse prior
mu_true = np.array([1.0, -1.0])
X = mvn_rvs(mu_true, Sigma, size=40, rng=rng)
# Work in precision form
Prec0 = np.linalg.inv(V0)
PrecL = np.linalg.inv(Sigma)
Vn = np.linalg.inv(Prec0 + X.shape[0] * PrecL)
mn = Vn @ (Prec0 @ m0 + PrecL @ X.sum(axis=0))
print("prior mean:", m0)
print("posterior mean:", mn)
print("true mean:", mu_true)
print("posterior cov:\n", Vn)
prior mean: [0. 0.]
posterior mean: [ 0.93823 -0.82923]
true mean: [ 1. -1.]
posterior cov:
[[0.02482 0.00985]
[0.00985 0.03713]]
11) Pitfalls#
Invalid covariance: \(\Sigma\) must be symmetric positive (semi)definite. Small asymmetries from floating-point noise can break factorization.
Near-singularity: Cholesky may fail if \(\Sigma\) is ill-conditioned. A common fix is adding jitter: \(\Sigma \leftarrow \Sigma + \varepsilon I\).
Avoid explicit matrix inverses: compute quadratic forms with solves (as we did) for stability.
Underflow in pdf: in moderate/high dimension, densities can be tiny; use logpdf for likelihoods.
Interpreting correlation: zero correlation implies independence only for jointly Gaussian variables.
CDF in high dimension: multivariate CDF evaluation can become expensive or less accurate as \(d\) grows.
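For the symmetry pitfall, a standard remedy is to re-symmetrize before factorizing; a small sketch (the injected 1e-14 noise stands in for accumulated floating-point error from a long chain of matrix operations):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 0.1 * np.eye(4)                  # SPD, exactly symmetric

# Simulate tiny asymmetry from floating-point noise
Sigma_noisy = Sigma + 1e-14 * rng.normal(size=(4, 4))
print(np.array_equal(Sigma_noisy, Sigma_noisy.T))  # False

Sigma_sym = 0.5 * (Sigma_noisy + Sigma_noisy.T)    # exactly symmetric again
print(np.array_equal(Sigma_sym, Sigma_sym.T))      # True
np.linalg.cholesky(Sigma_sym)                      # factorization succeeds
```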
12) Summary#
multivariate_normal is a continuous distribution on \(\mathbb{R}^d\) with parameters mean \(\mu\) and covariance \(\Sigma\).
Its density is controlled by the Mahalanobis distance \((x-\mu)^\top\Sigma^{-1}(x-\mu)\).
Key formulas: \(\mathbb{E}[X]=\mu\), \(\mathrm{Cov}(X)=\Sigma\), \(M_X(t)=\exp(t^\top\mu + \tfrac{1}{2}t^\top\Sigma t)\), and \(H(X)=\tfrac{1}{2}\log((2\pi e)^d|\Sigma|)\).
Sampling is easy via Cholesky: \(X=\mu+LZ\) with \(Z\sim\mathcal{N}(0,I)\).
SciPy’s scipy.stats.multivariate_normal provides pdf / logpdf / cdf / rvs / fit.
References#
SciPy docs: scipy.stats.multivariate_normal
Any standard multivariate statistics text (e.g., Anderson, An Introduction to Multivariate Statistical Analysis)